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Abstract 

Background: Soybean {Glycine max L) is one of the most important oil crops in the world. It is desirable to 
increase oil yields from soybean, and so this has been a major goal of oilseed engineering. However, it is still 
uncertain how many genes and which genes are involved in lipid biosynthesis. 

Results: Here, we evaluated changes in gene expression over the course of seed development using lllumina 
(formerly Solexa) RNA-sequencing. Tissues at 15 days after flowering (DAF) served as the control, and a total of 
1 1592, 16594, and 16255 differentially expressed unigenes were identified at 35, 55, and 65 DAF, respectively. Gene 
Ontology analyses detected 113 co-expressed unigenes associated with lipid biosynthesis. Of these, 15 showed 
significant changes in expression levels (log 2 fold values > 1) during seed development. Pathway analysis revealed 
24 co-expressed transcripts involved in lipid biosynthesis and fatty acid biosynthesis pathways. We selected 12 
differentially expressed genes and analyzed their expressions using qRT-PCR. The results were consistent with those 
obtained from Solexa sequencing. 

Conclusion: These results provide a comprehensive molecular biology background for research on soybean seed 
development, particularly with respect to the process of oil accumulation. All of the genes identified in our research 
have significance for breeding soybeans with increased oil contents. 
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Background 

Soybean [Glycine max (L.) Merrill], one of the most im- 
portant oil crops in the world that accounts for less than a 
third (27.7%) of the total global vegetable oil production. 
At present, the main producers of soybean are the United 
States, Brazil, Argentina, India, and China. Although world 
soybean oil production has steadily increased, rising from 
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37.73 million metric tons in 2007 to 43.65 million metric 
tons in 2012 (http://www.fas.usda.gov/psdonline/psdreport. 
aspx?hidReportRetrievalName=BVS&hidReportRetrievalID= 
708&hidReportRetrievalTemplateID=8), the production of 
soybean oil is still insufficient to meet the consumer de- 
mand, and the low oil content in soybean seeds is the main 
factor restricting yield. Therefore, an increase in soybean 
oil yield is desirable and has been one of the most major 
goals of oilseed engineering. 

To increase the production of soybean oil, it is not 
feasible to extend the planting area of the crop, because 
of increased competition for land by the rapidly rising 
population. Therefore, it is a better strategy to increase 
the seed oil content than to increase the planting area 
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[1]. In recent years, molecular genetics approaches have 
been used to modify seed oil content. Over-expression 
of a diacylglycerol acyltransferase (AtDGAT) cDNA in 
wild-type A. thaliana enhanced oil deposition and 
average seed weight [2]. The research of Wang et al 
indicated that the oil content of soybean seeds could 
be increased by upregulation of two soybean Dof-type 
transcription factor (GmDof) genes, which are asso- 
ciated with fatty acid biosynthesis [3]. The soybean 
genome also has been examined with regard to the 
relationships between gene expression profiles and 
gene function. Although the biochemical pathways that 
produce different storage components are well charac- 
terized, there is still no integrated model describing 
differentially expressed genes (DEGs) involved in soy- 
bean lipid biosynthesis. With the development of 
high-throughput technologies, including the newly 
developed Solexa/Illumina RNA-sequencing and DEG 
high-throughput deep sequencing approaches, new 
genes have been discovered and specific transcripts ana- 
lyzed. In Severins research, RNA Seq- Atlas provided a 
record of high-resolution gene expression in a set of 
various tissues. They also found dramatic tissue-specific 
gene expression of both the most highly-expressed 
genes and the genes specific to legumes in seed devel- 
opment and nodule tissues [4]. Furthermore, these 
technologies are useful for estimating overall gene 
expressions at different development stages and/or in 
different tissues, such as on rice transcriptome and 
disease [5-7]. 

In this study, we used Illumina Solexa sequencing 
to investigate gene expression in soybean seeds at dif- 
ferent developmental stages (15, 35, 55, and 65 days 
after flowering; DAF), and then compared transcript 
reads with the most recent release of the G. max 
genome sequence (assembly Glymal.01). Illumina Solexa 
de novo sequencing technology identified a total of 
11592, 16594, and 16255 differentially expressed uni- 
genes between the 15 and 35 DAF seeds, the 15 and 55 
DAF seeds, and the 15 and 65 DAF seeds, respectively 
(Additional file 1: Tables SI, Additional file 2: Table S2, 
Additional file 3: Table: S3). Of these, 9905 unigenes 
existed in all three of these contrast groups and repre- 
sented co-expressed unigenes. Furthermore, 124 candi- 
date unigenes were screened from the three contrasting 
cDNA libraries and characterized as those responsible 
for lipid synthesis. Our researches provide the whole pic- 
ture of DEG transcription patterns and expression levels 
during soybean seed development. The elucidation of 
DEG transcription patterns at specific stages of seed de- 
velopment also lays the foundation for understanding 
the molecular mechanisms underlying oil production. 
This research provides direction for controlling genes 
over-expression to increase soybean oil content. 



Methods 

Plant culture and collection 

The soybean cultivar Jiyu-72 was used as the experi- 
mental material. The plants were grown in a green 
house with a 15-h light (200 uEm" 2 s" 1 , 25°C)/9-h dark 
photoperiod (23°C), with relative humidity controlled at 
75%. The developmental processes of soybean seeds 
from flowering to seed maturity were observed from 
July to October 2010. Pods were harvested at 15 DAF 
(immature stage), and then every 5 days until 70 DAF 
(pods containing mature seeds). After removing the 
seed coat, the seeds were used for oil extraction or 
frozen at -80°C until mRNA extraction and sequencing. 

Measurement of oil content 

To extract the oil (or lipid), seeds harvested at 15, 20, 
25, 30, 35, 40, 45, 50, 55, 60, 65, and 70 DAF were oven- 
dried to constant weight at 85°C. The dry samples were 
ground to a fine powder in a disintegrator, and the pow- 
der was transferred into 10-mL glass tubes for oil extrac- 
tion. Oil was extracted using ligarine to determine total 
lipids (TL) gravimetrically [8]. Oil extraction was per- 
formed using a SER148 3/6 Extraction apparatus (VELP 
Scientifica, Italy). Extractions were carried out using 
triplicate samples for each stage, and mean values were 
determined. Errors are shown as standard deviations. 

Total RNA extraction, library construction, and deep 
sequencing 

RNA was isolated from the seeds harvested at 15, 35, 55, 
and 65 DAF, respectively, using a TIANGEN RNA Prep 
Pure Plant kit (Tiangen Biotech Co. Ltd., Beijing, China) 
following the manufacturers protocol, followed by a 
chloroform extraction to improve RNA purity. The yield 
and quality of total RNA samples were determined using 
a ND-1000 NanoDrop spectrophotometer. For RNA li- 
brary construction and deep sequencing, equal quantities 
of RNA samples from seeds at the four developmental 
stages were pooled. Approximately 6 ug RNA representing 
each group was submitted to Solexa (now Illumina Inc.) 
for sequencing. 

Differential expression (DE) detection 

The raw data were filtered to remove adaptor reads, low 
quality reads, and reads of copy number = 1, yielding a 
dataset consisting of clean reads. For the tissue-specific 
analyses, raw digital gene expression data were normal- 
ized using a variation of the reads /Kb /million (RPKM) 
method [9,10]. The RPKM method corrects for biases in 
total gene exon size and normalizes for the total short 
read sequences obtained in each tissue library. Genes 
with data P-value < 0.005, false discovery rate (FDR) 
<0.001, and estimated absolute log 2 -fold change >1 in 
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sequence counts across libraries were considered to 
be significantly differentially expressed. Differentially 
expressed genes that were co-expressed during these four 
stages were then subjected to cluster analysis using the R 
program (a language and environment for statistical 
computing and graphics). Functional categorization of 
stress-regulated genes was performed using BGI WEGO 
(http://wego.genomics.org.cn/cgi-bin/wego/index.pl). 

Quantitative real-time PCR (qRT-PCR) analysis 

To verify the data obtained by Solexa RNA-seq, qRT- 
PCR was performed on 12 genes with log 2 ratios ranging 
from 2 to 11 (Additional file 4: Table S4). The RNA sam- 
ples used for the qRT-PCR assays were the same as 
those used for the DEG experiments. qRT-PCR was per- 
formed on a Mx3000P instrument (Stratagene), with 
SYBR-Green detection (Quant qRT-PCR Kit, Tiangen 
Biotechnology), according to the manufacturer's instruc- 
tions. The soybean tubulin gene was used as an internal 
control (forward primer: 5'-ATGTTCCGTGGCAAGA 
TGAG-3; reverse primer: 5'-CATTGTTGGGAATCCAC 
TC-3') [11]. Primers were designed using the Primer 
Premier 5.0 and Oligo 6 programs to amplify products ap- 
proximately 150-bp long. Thermal cycle conditions were 
as follows: 2 min at 95°C followed by 40 cycles of 15 s 
at 95°C, 15 s at 56-57°C, and 15 s at 72°C. Each cDNA 
was analyzed three times, after which the average thresh- 
old cycle (Ct) was calculated per sample. Standard curves 
were established for all investigated genes using a series 
of amplicon dilutions. The relative expression levels were 
calculated as T (ACt of 35 ' 55 ' 65 DAF ' ACt of 15 DAF) - 

Results 

Accumulation of oil at different stages of seed 
development 

To explore lipid accumulation during development of 
soybean seeds, we quantified lipid content in soybean 
seeds harvested from 15 to 70 DAF. As shown in 
Figure 1, the oil content began to increase from 15 
DAF, then markedly increased from 15 to 20 DAF, and 
then remained stable from 20 to 25 DAF before again 
increasing rapidly from 25 to 35 DAF. From 35 to 55 
DAF, the oil content showed a steady increase, peaking 
at 60 DAF before dropping at 65 DAF. These results 
indicated that for the soybean cultivar JiYu-72, 15 DAF 
marked the beginning of oil accumulation, 35 DAF was 
the key stage for the rapid increase in the oil content. 
55 DAF was the stage at which oil content was high 
and stable, and 65 DAF was the stage at which oil con- 
tent decreased and nearly unchanged. Therefore, in 
order to explore the differentially expressed genes asso- 
ciated with lipid biosynthesis during soybean seed de- 
velopment, the following sequencing and qRT-PCR 
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Figure 1 Changes in lipid content during seed development. 

Lipid content was determined every 5 days. One star stands for the 
discrepancy is significant and double stars stand for the discrepancy 
is extremely significant. 



analyses were performed using samples from these four 
different developmental stages. 



Screening of differentially expressed genes (DEGs) from 
massive datasets 

Initially, Solexa RNA-sequencing revealed a total of 11592, 
16594, 16255 differentially expressed unigenes between 35 
DAF and 15 DAF seeds, 55 DAF and 15 DAF seeds, and 
65 DAF and 15 DAF seeds, respectively (Table 1). Among 
them, 9905 unigenes were present in all three of these 
contrast groups, and 1687, 6689, and 6350 were differen- 
tially expressed only in the 35 vs 15 DAF, 55 vs 15 DAF, 
and 65 vs 15 DAF groups, respectively. From the large 
datasets, we found that down-regulated genes were more 
abundant than up-regulated genes during soybean seed 
development. The normalized data are shown as scatter 
plots of log 10 -transformed transcription counts for the 
four different samples (Additional file 5: Figure SI). 

There were 680, 1004, and 850 up-regulated unigenes 
and 10912, 15590, and 15405 down- regulated unigenes 
in the three respective groups (Table 1). Among the up- 
regulated unigenes, 352 were co-expressed genes show- 
ing increases of at least 2-fold (log 2 ratio > 1), and 269, 
258, and 65 of the up-regulated genes were specific to 
each of the three respective groups. Among the down- 
regulated unigenes, 9476 were co-expressed genes show- 
ing decreases of at least 2-fold (log 2 ratio < 1), and 672, 
1278, and 769 of the down-regulated genes were specific 
to each of the three respective groups (Figure 2). 

Differentially expressed genes provide clues about the 
molecular events related to seed development. Further 
investigation of highly expressed genes may be war- 
ranted to determine what functional roles the highly 
expressed sequences may play in soybean seeds. To 
understand the relationship between the time at which 
unigenes are co-expressed and their biological 
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Table 1 Differentially expressed genes between seeds of 35 DAF and 15 DAF, 55 DAF and 15 DAF, and 65 DAF and 
15 DAF 



Contrast groups No. of genes 





Total 


Co-expressed 


Distinct 


Up-regulated 


Down-regulated 




genes 


genes 


genes 


Known 3 


Unknown 13 


Known 3 


Unknown b 


Seeds of 35DAF-VS-15DAF 


11592 


9905 


1687 


419 


261 


7314 


3598 


Seeds of 55DAF-VS-15DAF 


16594 


9905 


6689 


646 


358 


10203 


5378 


Seeds of 65DAF-VS-15DAF 


16255 


9905 


6350 


529 


321 


10246 


5159 



'known 3 ' represents genes with Gene Ontology annotation. 
'unknown 13 ' represents genes without Gene Ontology annotation. 



significance, we carried out Gene Ontology enrichment 
analysis. 

Gene ontology category analysis and DEGs in each 
category 

As a useful tool for gene functional annotation, WEGO 
(Web Gene Ontology Annotation Plot) has been widely 
used in many important studies, including the rice gen- 
ome project and the silkworm genome project [12-14]. 
It has become one of the most commonly used tools for 
downstream gene annotation analysis, especially in com- 
parative genomics studies. In this research, WEGO ana- 
lysis assigned the DEGs (FDR < 0.001 and |log 2 ratio| > 1) 
to three functional categories; Cellular Component, Mo- 
lecular Function, or Biological Process. A total of 11963 
unigenes had at least one GO functional category. In the 
three contrast groups, all differentially expressed genes 
with a Gene Ontology annotation were further classified 
into subsets. There were 11 subsets within the Cellular 
Component category, 10 subsets within the Molecular 
Function category, and 23 subsets within the Biological 
Process category (Figure 3). Thus, the most abundant 
unigenes were related to binding and catalytic activity in 
the Molecular Function category, cell, cell parts and 
organelles in the Cellular Component category, and cel- 
lular and metabolic in the Biological Process category. 



The results showed that these types of genes were highly 
enriched in our soybean transcriptomes. In terms of the 
categories, the three contrast groups showed similar 
patterns. 

Because we are interested in lipid biosynthesis, we 
focused on genes associated with this process that were 
differentially expressed during seed development. From 
the Component Ontology data, eight differentially co- 
expressed lipid particle unigenes were identified. Among 
them, seven unigenes were up-regulated, and one 
unigene was down- regulated during seed development. 
From the Function Ontology data, unigenes associated 
with lipid biosynthesis could be further classified into 
three categories: those with lipid kinase activity (9 differ- 
entially co-expressed down- regulated unigenes); those 
with fatty acid ligase activity (6 differentially co-expressed 
down-regulated unigenes); and those with fatty acid syn- 
thase activity (18 differentially co-expressed unigenes, 4 of 
which were up-regulated and 14 down-regulated). From 
the Process Ontology data, unigenes associated with lipid 
biosynthesis were in two categories; lipid biosynthetic 
process (78 differentially co-expressed unigenes, 5 of 
which were up-regulated, and 73 down-regulated), and 
fatty acid biosynthetic process (49 differentially co- 
expressed unigenes, 4 of which were up-regulated, and 45 
down-regulated). Interesting, the 49 co-expressed 
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Figure 3 Functional categorization of significantly differentially expressed genes during seed development. Functional categorization 
was performed using BGI WEGO (Web Gene Ontology Annotation Plot), (a). The functional categorization of differentially expressed genes 
between 35 DAF and 15 DAF. (b). The functional categorization of differentially expressed genes between 55 DAF and 15 DAF. (c). The functional 
categorization of differentially expressed genes between 65 DAF and 15 DAF. 
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Table 2 DEGs identity, expression ratios, and Gene Ontology annotations 





Unigene ID 


Gene ID 


Accession 


Annotation 




log 2 ratio 










No 




35 DAF 


55 DAF 


65 DAF 


gnl 


|UG|Gma#S1 5927547 


31463931 


CD405959 


Catalytic activity, mRNA sequence 


-12.45 


-12.45 


-12.45 


gnl 


|UG|Gma#S1 5936780 


31473219 


CD415247 


Gm_ck5311 Soybean induced by 

SdllCyllC aCICI vJ. mux LUWrK o 


-1.52 


-5.95 


-13.86 


gnl 


ll \t~ lTivn#Cl 1 C2"7/l/in 

|U(j|(jma#bz iDo/4ou 




Luy/yo4o 


Nucleoside-triphosphatase activity 


— i .oy 


"i f\f\ 

— z.UU 


— 1 .2.1 


gnl 


li irlr»wi#CTi cconoo 
|Uvj|vjma#bzl5ooUo9 


51 344862 


c_Uyo55y5 


Intracellular membrane-bounded organelle 


—1 .85 


—14.73 


—4.59 


gnl 


|UVJ|VjlTldTfJZ 1 jOOZ?U 


Cl '3/1/170'? 
D 1 jTr/Uj 




vjivioyuz't d i mu j.r i om _ r i uoy u. mux clmnh 


— £.. 1 O 


— z.zu 


3 1C 
—j. 1 D 


gnl 


ll \f \C »v»-»#CT3n/£'2/l/£0 

|UCj|oma?fbzoUDo4Dz 




rv7n5Ti a 
La/Uoz I O 


Water stressed 48 h segment 2 gmrtDrNSOl 
G. max cDNA 3' 


1 Q1 

— 1 .0 1 


— i .oy 


— 1 ./O 


gnl 


|UG|Gma#S23071366 


58024339 


CX711080 


Oxidoreductase activity 


-8.01 


-11.12 


-11.05 


gnl 


|UG|Gma#S39303981 


151397976 


EV267849 


Carbon-carbon lyase activity 


-6.22 


-5.33 


-13.86 


gnl 


|uvj|vjmaffo.5y.5u*ozD 


1 CI "3QQCT5 




C\ MACBATE \C\I\ CnV") m/1v r nMA C 
ULIVIAOoOl r JCVIOC/YZ U. iHUX CUInM j 


a e^c\ 
— *f.OU 


*5 3Q 


— Z.jZ 


gnl 


|Uvj|vjmaffoo7o lUo// 


i ci /incvvo 

1 J l^fUD//y 






"5 fi"5 
— Z.OZ 


1 fiQ 
— 1 .Oo 


— Z.U*! 


gnl 


ll inlf^maiK'SQCAftOQft 

|uu|vjrnaff jjyj'iozyo 


1 C'5Q'5ft'3'31 


CI \(\(\OC-7fi 
LUUUjj/D 


Oxidoreductase activity 


fik Oft 
— O.Uo 


i n an 
— i u.ou 


— 1 u.zz 


gnl 


|uvj|vjma?Fo^fDDoD iuy 


i yzou i uuy 


rvjyyu i uy 


r: 1 1 nilZlTP \C\I\ COV1 r, mnv /-nM A C 1 

vjllu i i r jcvi-jC/t i u. max cuinm d 


1 fin 

— 1 .ou 


1 Q7 

— i .y/ 


Til 

— Z. 1 O 


gnl 


ll ltZ\tZrv\^UQA^.^.A!lQ~71 

|uo|urnaff 5'ioj'ioy/ i 


i y/j i z/*to 


ri\uuoy/z 


oLiviuuyzirx jlvi-juiz C7. max clmnh d 


1 "53 
— 1 .ZJ 


1 H7 

— 1 .u/ 


1 C7 
— 1 .3/ 


gnl 


ll lfZl<'Zm:»:ttC/lCCC'2'2'21 
| U vJ | vJITIuTr J^r J J J J J J 1 


1 QT31 

i y/j i o/jd 


Fk'nnft'?'?'? 

rlMJUOO jZ 


Oxidoreductase activity 


Zl 3ft 


"5 fi*5 

— z.oz 


ft ni 

— O.U 1 


gnl 


ll lfZ\tZm^#QAA.Z.n~JQAQ 

|Uo|vjmaffo*foDU/yoy 


1 03 300/1 1 C 

i yo.5yy*f i d 




Soybean Seeds Containing Globular-Stage 
Embryos G. max cDNA, mRNA sequence 


"5 QC 

— z.y_> 


/I 


1 Zl /Ifi 


gnl 


|UG|Gma#S46512605 


193402683 


FK365941 


Oxidoreductase activity 


-3.98 


-4.68 


-5.61 


gnl 


|UG|Gma#S47453769 


208298984 


GE091511 


Oxidoreductase activity, mRNA sequence 


-3.28 


-14.04 


-14.04 


gnl 


|UG|Gma#S47698257 


209701642 


BW673128 


Oxidoreductase activity, mRNA sequence 


-3.89 


-13.83 


-4.34 


gnl 


|UG|Gma#S48312113 


210145949 


AK244640 


Oxidoreductase activity 


-2.54 


-4.24 


-2.95 


gnl 


|UG|Gma#S48312226 


210145836 


AK244527 


Oxidoreductase activity 


-1.22 


-2.76 


-2.62 


gnl 


|UG|Gma#S48312292 


210145770 


AK244461 


G. max cDNA 


-5.08 


-14.02 


-5.71 


gnl 


|UG|Gma#S48312360 


210145702 


AK244393 


G. max cDNA 


-7.45 


-10.67 


-9.92 


gnl 


|UG|Gma#S48312476 


210145586 


AK244277 


G. max cDNA 


-2.29 


-2.99 


-2.62 


gnl 


|UG|Gma#S48312873 


210145189 


AK243880 


G. max cDNA 


-5.00 


-7.34 


-7.85 


gnl 


|UG|Gma#S48313629 


210144071 


AK286853 


Oxidoreductase activity 


-5.01 


-6.81 


-6.93 


gnl 


|UG|Gma#S48313665 


210144035 


AK286817 


Oxidoreductase activity 


-1.60 


-2.61 


-2.20 


gnl 


|UG|Gma#S48314836 


210142864 


AK285646 


Oxidoreductase activity 


-5.68 


-10.31 


-10.25 


gnl 


|UG|Gma#S48315365 


210142183 


AK246102 


G. max cDNA 


-12.13 


-2.15 


-2.09 


gnl 


|UG|Gma#S48315795 


210141753 


AK245672 


Threonine kinase activity 


-1.11 


-1.38 


-1.25 


gnl 


|UG|Gma#S48315845 


210141703 


AK245622 


Fatty acid ligase activity 


-4.52 


-5.95 


-5.30 


gnl 


|UG|Gma#S48316028 


210141520 


AK245439 


Fatty acid synthase activity 


-2.84 


-6.12 


-6.05 


gnl 


|UG|Gma#S48316351 


210141197 


AK245114 


Fatty acid synthase activity 


-2.05 


-11.17 


-2.90 


gnl 


|UG|Gma#S48316398 


210141150 


AK245067 


G. max cDNA 


-2.16 


-3.24 


-3.39 


gnl 


|UG|Gma#S48316494 


210141054 


AK244973 


Oxidoreductase activity 


-4.06 


-3.80 


-3.91 


gnl 


|UG|Gma#S48316776 


210140772 


AK244691 


G. max cDNA 


-1.54 


-2.50 


-2.62 


gnl 


|UG|Gma#S48540273 


213602571 


DB978070 


Transferase activity, mRNA sequence 


-11.98 


-11.98 


-11.98 


gnl 


|UG|Gma#S4876535 


6070118 


AW099690 


Oxidoreductase activity 


-12.38 


-12.38 


-12.38 


gnl 


|UG|Gma#S4928292 


7924555 


AW830581 


G. max cDNA clone 


-1.09 


-1.79 


-1.06 


gnl 


|UG|Gma#S4989750 


9983486 


BE657594 


oxidoreductase activity, mRNA sequence 


-2.95 


-12.36 


-12.36 


gnl 


|UG|Gma#S4991939 


10252627 


BE820393 


fatty acid ligase activity 


-1.44 


-3.87 


-3.22 


gnl 


|UG|Gma#S5146402 


505137 


D 13949 


G. max lipoxygenase-2 mRNA 


8.72 


7.41 


8.23 
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Table 2 DEGs identity, expression ratios, and Gene Ontology annotations (Continued) 



gnl 


| <JVJ|VJII1cIttO D 1 40J ID 


2739009 




yj . iiiuA i_y luv.hi uiiitr rtju i iiuiiuvJAyytri idoc 


_2 


1 5 


—3.03 


—3.1 1 


gnl 


UVJ IVJIIldt/OJ I4DO 1 / 


ZZ/U77.3 




\j, mux v_a i z uiiiyuiiiy cr iidiiu pruicin 


2 


/ D 
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Table 2 DEGs identity, expression ratios, and Gene Ontology annotations (Continued) 
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unigenes detected in the fatty acid biosynthetic process 
were all included in unigenes detected in the lipid biosyn- 
thetic process. Among the 113 unigenes analyzed by Gene 
Ontology, 15 co-expressed unigenes that showed high 
levels of expression (log 2 fold values > 1) could be respon- 
sible for lipid biosynthesis (Table 2). 

Annotation of lipid-relevant genes in fatty acid pathway 

Genes usually interact with each other to carry out 
certain biological functions. Knowledge of the expres- 
sions of multiple genes and their regulation during oil 
biosynthesis is required to further understand the regu- 
latory mechanisms controlling oil metabolism. Pathway- 



based analysis helps to clarify the biological functions of 
genes, and identifies significantly enriched metabolic 
pathways or signal transduction pathways associated 
with the DEGs compared with the whole genome back- 
ground. For our research, 124 biological pathways, in- 
cluding the starch and sucrose metabolism pathway, the 
citrate cycle pathway, the fatty acid biosynthesis path- 
way, and many others were identified by KEGG pathway 
analysis of unigenes. A total of 4836, 6796, and 6758 
DEGs with pathway annotations were identified in the 
three respective contrast groups. From those pathways, 
we selected the fatty acid biosynthesis pathway (Figure 4) 
for deep analysis. In the three contrast groups, 24 DEGs 
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Figure 4 Fatty acid pathway in soybean. Processes in pink rectangles are shown in abbreviated form. Red or green rectangles indicate 
up-regulated or down-regulated genes, respectively. MSN, Fas: fatty acid synthase; FAS1, 2: fatty acid synthase subunit beta, alpha; FabA: 
3-hydroxydecanoyl-[acyl-carrier-protein] dehydratase; FabB, FabL, FabV\: 3-oxoacyl-[acyl-carrier-protein] synthase I, II, III; FabD: [acyl-carrier-protein] 
S-malonyltransferase; FabG: 3-oxoacyl-[acyl-carrier protein] reductase; Fab\, FabK, FabL: enoyHacyl-carrier protein] reductase I, II, III; FabZ: 
3R-hydroxymyristoyl ACP dehydrase; MdcH: malonate decarboxylase epsilon subunit; FatA, FatB: fatty acyl-ACP thioesterase A, B. 



associated with the fatty acid pathway were co- 
expressed (Table 3). Interestingly, 13 of the identified 
unigenes were the same as those identified from the 
GO analysis. Among those 24 DGEs, only four ungenes 
(BT098182, AK245394, BT093926, BT093412) were up- 
regulated, which are encoding 3-oxoacyl-[acyl-carrier 
protein] reductases (FabG) and were annotated as hav- 
ing fatty acid synthase activity. The other 20 unigenes 
were down-regulated, which may be the negatively con- 
trolled genes in the fatty acid biosynthesis pathway. 
Figure 4 shows the locations of the differentially co- 
expressed unigenes in the fatty acid pathway. Those 
shown in red and green represent genes that were up- 
regulated and down-regulated, respectively, during seed 
development. 



Clustering of candidate genes 

A total of 124 unigenes, including those detected by GO 
and pathway analyses, were selected for clustering ana- 
lysis. All of these unigenes showed different expression 
profiles. The genes were clustered according to the simi- 
larity of their expression patterns. A subset of the data is 
shown in Figure 5. The heatmap of the RPKM normal- 
ized log 2 -transformed transcription count was generated 
using R programs. Up-regulated unigenes are shown in 
red, while down regulated ones are shown in green. This 
analysis allowed us to define common qualitative pat- 
terns in gene expression changes over time. Our results 
suggested that only a few genes were expressed differ- 
entially in the three contrast groups, and that the ma- 
jority of the transcriptome had approximately similar 
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Table 3 DEGs involved in fatty acid pathway 
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Fatty acid synthase activity 
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G. max beta-ketoacyl-acyl carrier 
protein synthase III mRNA 
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-2.05 


-2.09 


gnl|UG|Gma#S23063462 


58016474 


CX703216 


Water-stressed 48 h segment 2 gmrtDrNSOl 


-1.81 


-1.39 


-1.76 



G. max cDNA 3' 



expression levels. In this manner, similar patterns were 
clustered together in the same manner as a taxonomic 
tree. Therefore, we can speculate on the roles of uni- 
genes of unknown function by comparison to the anno- 
tated genes with known functions in the same cluster. 

Confirmation of differentially expressed genes by qRT-PCR 

To confirm the expression patterns determined by 
Solexa RNA-sequencing analysis, we used qRT-PCR ana- 
lyses to analyze expressions of 12 candidate genes 
(Additional file 4: Table S4). The soybean tubulin gene 
was used as an internal control Although the Solexa 
log 2 -fold values showed slight variations compared 
with the corresponding values from the qRT-PCR ana- 
lyses, the expression data from Solexa RNA-seq ana- 
lysis was largely consistent with those obtained by 
qRT-PCR (Additional file 6: Figure S2). 



Discussion 

Lipid synthesis depends on the correct spatial and tem- 
poral activity of many gene products. These genes exe- 
cute their function in three stages: fatty acid synthesis in 
the plastid, triacylglycerol (TAG) synthesis in the endo- 
plasmic reticulum (ER), and assembly into an oil body. 
Significant improvements in oil accumulation must be 
accompanied by changes in activity of the genes involved 
in fatty acid biosynthesis in developing seeds. Identifica- 
tion of these genes and their regulatory pathway would 
provide not only new genetic information for under- 
standing soybean seed development, but also for con- 
trolling gene expression in developing seeds to alter oil 
accumulation. 

This study has provided a new data set identifying the 
expression of DEGs during oil accumulation in develop- 
ing seeds. Massive parallel sequencing identified 11592, 
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Figure 5 Expression changes and cluster analysis of genes (124 in total) that were differentially expressed between 35 DAF and 15 
DAF seeds (control), 55 DAF and 15 DAF seeds, and 65 DAF and 15 DAF seeds. Cluster analysis of genes was performed using R program. 
Rows represent differentially expressed genes, columns represent different contrast groups. Red, green, and black boxes represent genes showing 
increased, decreased, or equal expression levels, respectively. Color saturation reflects magnitude of log 2 expression ratio for each gene. 



16594 and 16255 differentially expressed unigenes from 
three contrasting libraries covering four stages of seed 
development. We examined gene expression levels in 
detail, and found significant differences among the four 
growth stages. Among the differentially expressed genes, 
more were down- regulated than up-regulated, and some 
showed a differential expression pattern in all three con- 
trast groups, indicating that there were overlaps at the 
transcriptional level. The fact that there were many 
down-regulated genes indicates that there are more 
negatively regulated genes than positively regulated ones 
with functions in the fatty acid pathway. However, it 
does not mean that lower expression of these genes 
leads to lower oil contents, because of the complexity of 
the lipid biosynthetic pathway. To identify the genes 
associated with lipid biosynthesis, we determined gene 



expression patterns at specific stages of seed develop- 
ment, and conducted a GO analysis. To explore the 
genes with unknown functions, the expression patterns 
of 124 unigenes were analyzed by hierarchical clustering 
according to similarities in expression profiles across all 
conditions. 

In the lipid biosynthetic pathway, acetyl CoA carboxyl- 
ase (ACCase) plays an important regulatory role. 
ACCase catalyzes the carboxylation of acetyl- CoA to 
malonyl-CoA [15], which is then transacylated by 
malonyl-CoA-acyl carrier protein transacylase (FabD or 
MCAT,[EC 2.3.1.39]) to the acyl carrier protein (ACP), 
forming malonyl ACP. The latter adds a 2-carbon acetyl 
unit to the nascent or growing fatty acyl chain. The basic 
reaction of fatty acid synthesis is to combine molecules 
composed of two carbon units into longer chains to 
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form fatty acids. The initial condensation of 2-carbon 
units is catalyzed by p-ketoacyl-ACP synthase III (KASUI, 
EC2.3.1.180) which uses acetyl-CoA and malonyl-ACP as 
substrates. After the two reductions and dehydration 
reactions, a 4-carbon fatty acid, butyrate, is produced. 
This is poorly condensed by KASUI but is a good sub- 
strate for KAS1, which elongates 4-carbon chains to 14- 
carbon chains. 

In this study, we studied in detail the genes with key 
roles in lipid biosynthesis. Very frequently, the same en- 
zymatic function is redundantly encoded by several uni- 
genes. This may be the result of different proteins 
referenced with the same EC number or it may represent 
different transcripts encoding specific enzyme subunits. 
This situation was significant in the present study. Most 
plants have two forms of ACCase, the homomeric form in 
the cytosol, composed of a single large polypeptide catalyz- 
ing the individual carboxylation reactions [16], and the het- 
eromeric form in plastids, composed of four subunits; 
biotin carboxylase (BC) [17], biotin carboxyl carrier protein 
(BCCP) [18], a-carboxyltransferase (a-CT) [19] and p- 
carboxyltransferase (p-CT) [20]. In the present study, seven 
co-expressed unigenes (GenBank accession nos: BT094733, 
AK245399, U40979, AF165159, CX703216, BM188175, 
AW830581) encoding ACCases were identified as DEGs in 
the three contrast groups. All of them were down- 
regulated (log 2 ratio < 1) in the three contrast groups ex- 
cept one gene (AK245356) encoding an ACCase was up- 
regulated (log 2 ratio = 1.4) at 35 DAF. Three co-expressed 
unigenes (GenBank accession nos: BM188175, CX703216, 
AW830581, EC6.3.4.14) that are associated with the trans- 
formation of BCCP to carboxybiotin-carboxyl-carrier pro- 
tein were down-regulated during seed development. 

The next enzyme in fatty acid pathway is FabD, 
which catalyzes the transfer of malonyl-CoA to the holo 
acyl carrier protein (ACP), generating malonyl-ACP 
[21]. In the present study, the co-expressed unigene 
(AW34878) encoding FabD was down-regulated during 
seed development. 

KASUI (FabH) catalyzes the subsequent condensation 
and transacylation of acetyl- Co A with malonyl-ACP and 
has a universal role in fatty acid biosynthesis. Transgenic 
B. napus seeds overexpressing KASUI driven by napin 
also contained lower oil levels compared to what was 
found in the wild- type [22]. In the present study, the co- 
expressed unigene (AF260565) encoding KASUI was sig- 
nificantly down-regulated in the fatty acid pathway with 
a log 2 ratio of -1.10, -2.05, and -2.09 in the three con- 
trast groups. So compared to 15 DAF, the differentially 
expressed levels of the gene encoding KASUI at 35DAF, 
55DAF, 65DAF negatively correlated to fatty acid synthe- 
sis during the seed development, which is consistent 
with previous research [23]. The same down- regulated 
patterns were also observed for FabY (AF243183, 



CX702997, AK244605, AK285347), FabZ (BT098415), 
Fabl (BI970856), FatB (AK244101) and oleoyl-[acyl-car- 
rier-protein] hydrolase (EC 3.1.2.14, AK244101). 

In most plant tissues, Acyl-ACP thioesterase is the 
major determinant of chain length and level of saturated 
fatty acids [24]. It plays an important role by influencing 
the fatty acid composition of the produced oil and then 
mainly the ratio of 16 C to 18 C fatty acids and the level 
of saturated fatty acid [25]. Two distinct but related 
thioesterase gene classes exist in higher plants: FatA is 
an acyl-specific thioesterase, with specificity for 18:1> 
> 18:0> > 16:0 fatty acids [26], which is considered 
an essential "housekeeping" enzyme in all plant cells; 
FatB is a thioesterase, which shows specificity for 
16:0 > 18:1 > 18:0 fatty acids [27]. In this study, Solexa 
sequencing analysis showed that the expression of gene 
(CX706542) encoding FatA was unchanged at 35 DAF, 
but down-regulated at 55 DAF and 65 DAF with log 2 ra- 
tios of -1.7 and -1.5. While differentially co-expressed 
unigene (AK244101) encoding FatB showed constant 
decreased expressed levels with log 2 ratios of -1.4, -2.4, 
-2.7, respectively. Therefore, the expression level varia- 
tions of CX706542 and AK244101 would influence the 
16 and 18 carbon fatty acids synthesis, which needs to 
be confirmed by experiment. 

Soybean seeds contain three lipoxygenase isozymes; 
lipoxygenases 1, 2, and 3. Lipoxygenases (linoleate: 
oxygen oxidoreductase; EC 1.13.11.12) catalyze the 
oxidation of unsaturated fatty acids to produce conju- 
gated unsaturated fatty acid hydroperoxides, which are 
converted to volatile compounds associated with un- 
desirable flavors [28]. Eliminating this enzyme from 
seeds could lead to better quality soybean protein and 
oil products. Three co-expressed lipoxygenase genes 
(X67304, U50081, D13949) identified in this study 
were among those with the highest expression levels. 
To our knowledge, this is the first report of a close 
correlation between lipoxygenase expression and fatty 
acid accumulation. 

The P450 family is a large and diverse group of 
isozymes that mediate a diverse array of oxidative 
reactions. The activities of most of these enzymes are 
associated with biosynthetic processes such as phenyl- 
propanoid, terpenoid, and fatty acid biosyntheses. Ten 
alkane-inducible P450 genes from Candida tropicalis 
(ATCC20336), which were responsible for omega- 
hydroxylation of n-alkanes and fatty acids, were cloned 
[29]. In their research, these enzymes were believed to 
be at least in part limiting in the conversion of fatty 
acid to diacids, but their relative oxidative activity to- 
ward other fatty acids was not known. The two uni- 
genes (AF022464, DQ340246) encoding soybean P450 
monooxygenase were identified in this research. Both 
of them showed down- regulated expressions during 
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seed development, which indicated that they are nega- 
tively correlated to the fatty acid accumulation. For the 
other differentially co-expressed unigenes shown in 
Table 2, there is little information available about their 
relationship with lipid accumulation. Their log 2 ratios 
indicate that they have significant functions in regulat- 
ing soybean seed development. However, our results 
cannot validate a direct relationship between these uni- 
genes and oil accumulation. This topic requires further 
research. 

Because we are interested in oil production in soy- 
beans, we selected the fatty acid biosynthesis pathway 
for deep analysis from among the 124 biochemical path- 
ways identified by Solexa sequencing. Twenty four co- 
expressed unigenes in the fatty acid pathway, including 
13 that overlapped with the unigenes identified in the 
GO analysis, showed significant correlations with fatty 
acid accumulation (Table 3). 

The up-regulated genes that were significantly corre- 
lated with fatty acid accumulation included ACCase and 
lipoxygenase. The down-regulated genes that were sig- 
nificantly correlated with fatty acid accumulation 
included FabD, KAS III, FabF, FabZ, Fabl, FatB, FatA, 
P450, and oleoyl-[acyl-carrier-protein] hydrolase genes. 
FabG genes were both up- and down- regulated during 
seed development. 

The analysis of the genes involved in the fatty acid 
biosynthetic pathway provides a basis to identify key 
regulatory processes controlling oil accumulation in 
soybean. However, biosynthetic pathways involve the 
cooperation of multiple genes. It is difficult to in- 
crease seed oil content by overexpressing a single 
gene. The large-scale characterization of unigenes 
described in this study shows comprehensive correla- 
tions between DEGs and fatty acid accumulation in 
soybean. 

Conclusion 

In this study, 11592, 16594, and 16255 differentially 
expressed unigenes were identified at 35, 55, and 
65 days after flowering of soybean, respectively. Gene 
Ontology analyses detected 113 co-expressed unigenes 
associated with lipid biosynthesis. Of these, 15 showed 
significant changes in expression levels (log 2 fold 
values > 1) during seed development. Pathway analysis 
revealed 24 co-expressed transcripts involved in lipid 
biosynthesis and fatty acid biosynthesis pathways. These 
results provide a comprehensive molecular biology 
background for research on soybean seed development, 
particularly with respect to the process of oil accumula- 
tion. All of the genes identified in our research have 
significance for breeding soybeans with increased oil 
contents. 
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Additional file 1: Table SI. Differentially expressed genes between 35 
DAF and 15 DAF (FDR <0.001 and |log 2 ratio| >1). 

Additional file 2: Table S2. Differentially expressed genes between 55 
DAF and 15 DAF (FDR <0.001 and |log 2 ratio| >1). 

Additional file 3: Table S3. Differentially expressed genes between 65 
DAF and 15 DAF (FDR <0.001 and |log 2 ratio| >1). 

Additional file 4: Table S4. Details of primers used for qRT-PCR. 

Additional file 5: Figure SI. Scatter plot of differentially expressed 
genes in seeds harvested at 35 DAF, 55 DAF, and 65 DAF, compared with 
15 DAF. (a). Scatter plot of differentially expressed genes between 35 DAF 
and 15 DAF. (b). Scatter plot of differentially expressed genes between 55 
DAF and 15 DAF. (c). Scatter plot of differentially expressed genes 
between 65 DAF and 15 DAF. TPM = Transcript per million (normalized 
expression level of genes). 

Additional file 6: Figure S2. Confirmation of differential gene 
expression by qRT-PCR. 
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